DOC: update the pandas.DataFrame.to_sql docstring #20126

Merged
16 commits merged on Mar 13, 2018

Conversation

jazzmuesli
Contributor

@jazzmuesli jazzmuesli commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
##################### Docstring (pandas.DataFrame.to_sql)  #####################
################################################################################

Write records stored in a DataFrame to a SQL database.

This function inserts all rows of the dataframe into the given
 table and recreates it if if_exists='replace'. Databases supported by
 SQLAlchemy or DBAPI2 are also supported.

Parameters
----------
name : string
    Name of SQL table.
con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
    Using SQLAlchemy makes it possible to use any DB supported by that
    library. If a DBAPI2 object, only sqlite3 is supported.
schema : string, default None
    Specify the schema (if database flavor supports this). If None, use
    default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
    Accepted values:
    - fail: If table exists, raise a ValueError.
    - replace: If table exists, drop it, recreate it, and insert data.
    - append: If table exists, insert data. Create if does not exist.
index : boolean, default True
    Write DataFrame index as a column.
index_label : string or sequence, default None
    Column label for index column(s). If None is given (default) and
    `index` is True, then the index names are used.
    A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, default None
    If not None, then rows will be written in batches of this size at a
    time.  If None, all rows will be written at once.
dtype : dict of column name to SQL type, default None
    Optional specifying the datatype for columns. The SQL type should
    be a SQLAlchemy type, or a string for sqlite3 fallback connection.

Returns
--------
    None

See Also
--------
pandas.read_sql_query : read a DataFrame from a table

Examples
--------
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///example.db', echo=False)
>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> # create a table from scratch with 3 rows
>>> df.to_sql('users', con=engine, if_exists='replace')
>>> df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
>>> # 2 new rows inserted
>>> df1.to_sql('users', con=engine, if_exists='append')
>>> # table will be recreated and 5 rows inserted
>>> df = pd.concat([df, df1], ignore_index=True)
>>> df.to_sql('users', con=engine, if_exists='replace')
>>> pd.read_sql_query("select * from users",con=engine)
   index    name
0      0  User 1
1      1  User 2
2      2  User 3
3      3  User 4
4      4  User 5

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.to_sql" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
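
For reference, the three `if_exists` modes can be exercised against an in-memory SQLite database. This is a minimal sketch, not part of the PR: it passes a plain `sqlite3` connection (the DBAPI2 fallback mentioned in the docstring) so that SQLAlchemy is not required, and the helper `count_rows` is a hypothetical name introduced only for illustration. As far as I know, `if_exists='fail'` raises a `ValueError` when the table already exists instead of silently skipping the write.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database via the stdlib DBAPI2 driver (assumption: the
# sqlite3 fallback is used here instead of a SQLAlchemy engine).
con = sqlite3.connect(":memory:")


def count_rows(table):
    """Hypothetical helper: number of rows currently stored in `table`."""
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


df = pd.DataFrame({"name": ["User 1", "User 2", "User 3"]})

# 'replace' creates the table (dropping any existing one first).
df.to_sql("users", con=con, if_exists="replace")
rows_after_replace = count_rows("users")

# 'append' inserts into the existing table.
df1 = pd.DataFrame({"name": ["User 4", "User 5"]})
df1.to_sql("users", con=con, if_exists="append")
rows_after_append = count_rows("users")

# 'fail' (the default) refuses to touch an existing table.
try:
    df.to_sql("users", con=con, if_exists="fail")
    fail_raised = False
except ValueError:
    fail_raised = True
```

The row counts are read back with a raw query so the check does not depend on what `to_sql` itself returns.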


Examples
--------
>>> import pandas as pd

pandas import is not recommended in examples.

@@ -1865,17 +1865,21 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
"""
Write records stored in a DataFrame to a SQL database.

This function inserts all rows of the dataframe into the given

would be good to know which databases are supported and how.
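
One way to make the "how" concrete: per the docstring, `con` is either a SQLAlchemy engine or, in legacy mode, a DBAPI2 connection, where only `sqlite3` is supported. The sketch below covers only the second case, using the standard-library `sqlite3` module and string SQL types in `dtype`, as the docstring describes for the fallback. The table name `users` and the `REAL` override are illustrative assumptions, not taken from the PR.

```python
import sqlite3

import pandas as pd

# Legacy mode: a DBAPI2 connection is passed as `con`; per the docstring,
# sqlite3 is the only DBAPI2 driver supported this way.
con = sqlite3.connect(":memory:")

df = pd.DataFrame({"id": [0, 1, 2], "name": ["User 1", "User 2", "User 3"]})

# With the sqlite3 fallback, `dtype` values are plain strings rather than
# SQLAlchemy types. Here the integer `id` column is stored as REAL
# (an arbitrary choice, just to show that the override takes effect).
df.to_sql("users", con=con, index=False, dtype={"id": "REAL"})

# Inspect the schema SQLite actually created: column name -> declared type.
column_types = {
    row[1]: row[2] for row in con.execute("PRAGMA table_info(users)")
}
```

With a SQLAlchemy engine the call would look the same, with `con=engine` and SQLAlchemy type objects in `dtype` instead of strings.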

>>> gen_users = lambda ids: {"id": ids, "name" : gen_names(ids)}
>>> df=pd.DataFrame(gen_users(list(range(3))))
>>> # create a table from scratch
>>> df.to_sql('users',con=engine,if_exists='replace')

some example lines are not compatible with PEP8

>>> engine = create_engine('sqlite:///example.db', echo=False)
>>> gen_names = lambda ids: ["User" + str(x) for x in ids]
>>> gen_users = lambda ids: {"id": ids, "name" : gen_names(ids)}
>>> df=pd.DataFrame(gen_users(list(range(3))))

Maybe it's a bit more straightforward to generate the DataFrame without helper functions,
e.g. df = pd.DataFrame(['User 1', 'User 2', 'User 3'])
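
A quick check of this suggestion, as a sketch: the lambda-based helpers from the diff and a literal dict produce the same frame, so the literal form loses nothing. The variable names `df_helpers`, `df_literal`, and `bare_columns` are made up for the comparison. A bare list gets a default integer column label, so a dict with an explicit column name is probably the better fit for a SQL table.

```python
import pandas as pd

# Helper-based construction, as in the version of the example under review.
gen_names = lambda ids: ["User" + str(x) for x in ids]
gen_users = lambda ids: {"id": ids, "name": gen_names(ids)}
df_helpers = pd.DataFrame(gen_users(list(range(3))))

# Literal construction: the same data written out directly.
df_literal = pd.DataFrame(
    {"id": [0, 1, 2], "name": ["User0", "User1", "User2"]}
)

# True when both frames hold identical data, labels, and dtypes.
same = df_helpers.equals(df_literal)

# A bare list gets a default integer column label instead of a name.
bare_columns = list(pd.DataFrame(["User 1", "User 2", "User 3"]).columns)
```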

con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that
library. If a DBAPI2 object, only sqlite3 is supported.
schema : string, default None
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
Accepted values:
Contributor

FWIW, I find it clearer without the line here, though I know the script complains when it isn't present :) cc @datapythonista

@@ -1892,6 +1897,29 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
Optional specifying the datatype for columns. The SQL type should
be a SQLAlchemy type, or a string for sqlite3 fallback connection.

Returns
Contributor

The Returns section can be omitted if there's no return value.

Contributor Author

python scripts/validate_docstrings.py pandas.DataFrame.to_sql complains:
Errors found:
No returns section found


See Also
--------
pandas.io.sql.to_sql : this function will be called.
Contributor

I don't think that method is public. I think just link to pandas.read_sql

--------
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///example.db', echo=False)
>>> gen_names = lambda ids: ["User" + str(x) for x in ids]
Contributor

Any chance you could simplify this? It's a bit dense.

Maybe just make a simple df, display it. then show the insert. And maybe show a pd.read_sql to view the result.
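
A minimal sketch of the flow suggested here (build a small frame, write it, read it back), assuming an in-memory SQLite database reached through the standard-library `sqlite3` driver rather than a SQLAlchemy engine. Passing `index=False` is my own choice to keep the round trip symmetric; the comment does not ask for it.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")

# 1. A simple frame.
df = pd.DataFrame({"name": ["User 1", "User 2", "User 3"]})

# 2. Insert it; index=False keeps the DataFrame index out of the table.
df.to_sql("users", con=con, index=False)

# 3. Read the table back to view the result.
result = pd.read_sql("SELECT * FROM users", con)
```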

@pep8speaks

pep8speaks commented Mar 10, 2018

Hello @jazzmuesli! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 12, 2018 at 22:15 Hours UTC

@@ -1865,17 +1865,22 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
"""
Write records stored in a DataFrame to a SQL database.

This function inserts all rows of the dataframe into the given
table and recreates it if if_exists='replace'. Databases supported by
Contributor

can add a link to SQLAlchemy in References

>>> engine = create_engine('sqlite://', echo=False)
>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> # create a table from scratch with 3 rows
>>> df.to_sql('users', con=engine, if_exists='replace')
Contributor

can you add blank lines between cases
also for comments, don't use the leading '>>>'

@jreback added the Docs and IO SQL (to_sql, read_sql, read_sql_query) labels on Mar 10, 2018

See Also
--------
pandas.read_sql_query : read a DataFrame from a table
Contributor

just to pandas.read_sql

preich and others added 6 commits March 10, 2018 16:05
* Added more examples
* Reworded extended
* Reformed if_exists
* Added Raises
* Removed Returns
* Added DBAPI2 ref
@TomAugspurger
Contributor

Made some updates if anyone has a chance to review @jazzmuesli.

@TomAugspurger added this to the 0.23.0 milestone on Mar 13, 2018
@TomAugspurger merged commit 50b2184 into pandas-dev:master on Mar 13, 2018
@TomAugspurger
Contributor

Thanks @jazzmuesli !

4 participants